NSF PAR Search | NSF Public Access Repository

Baleen: ML Admission & Prefetching for Flash Caches

Wong, Daniel Lin-Kit; Wu, Hao; Molder, Carson; Gunasekar, Sathya; Lu, Jimmy; Khandkar, Snehal; Sharma, Abhinav; Berger, Daniel S.; Beckmann, Nathan; Ganger, Gregory R. (February 2024, USENIX)

Flash caches are used to reduce peak backend load for throughput-constrained data center services, reducing the total number of backend servers required. Bulk storage systems are a large-scale example, backed by high-capacity but low-throughput hard disks, and using flash caches to provide a more cost-effective storage layer underlying everything from blobstores to data warehouses. However, flash caches must address the limited write endurance of flash by limiting the long-term average flash write rate to avoid premature wearout. To do so, most flash caches must use admission policies to filter cache insertions and maximize the workload-reduction value of each flash write. The Baleen flash cache uses coordinated ML admission and prefetching to reduce peak backend load. After learning painful lessons with our early ML policy attempts, we exploit a new cache residency model (which we call episodes) to guide model training. We focus on optimizing for an end-to-end system metric (Disk-head Time) that measures backend load more accurately than IO miss rate or byte miss rate. Evaluation using Meta traces from seven storage clusters shows that Baleen reduces Peak Disk-head Time (and hence the number of backend hard disks required) by 12% over state-of-the-art policies for a fixed flash write rate constraint. Baleen-TCO, which chooses an optimal flash write rate, reduces our estimated total cost of ownership (TCO) by 17%. Code and traces are available at https://www.pdl.cmu.edu/CILES/.
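As a rough illustration of why an admission policy might target Disk-head Time rather than byte miss rate, the sketch below assumes a simple per-IO cost model (a fixed positioning cost plus a size-proportional transfer cost) with hypothetical disk parameters (`SEEK_TIME_S`, `BANDWIDTH_BPS`), and a hypothetical `predicted_hits` field standing in for an ML model's output. It ranks candidates by estimated disk-head time saved per byte written to flash and admits them until a flash write budget is exhausted. This is not Baleen's implementation: Baleen's model is trained using its episode-based residency model and is coordinated with prefetching, neither of which is modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical disk parameters (illustrative assumptions, not values from the paper).
SEEK_TIME_S = 0.008        # fixed positioning cost per backend IO, in seconds
BANDWIDTH_BPS = 150e6      # sequential transfer rate, in bytes per second


def disk_head_time(io_bytes: int) -> float:
    """Approximate disk-head time of one backend IO: a fixed positioning
    cost plus a transfer cost proportional to the IO size."""
    return SEEK_TIME_S + io_bytes / BANDWIDTH_BPS


@dataclass
class Candidate:
    key: str
    size_bytes: int
    predicted_hits: float  # hypothetical stand-in for an ML model's prediction


def admission_score(c: Candidate) -> float:
    """Estimated disk-head time saved per byte written to flash."""
    saved = c.predicted_hits * disk_head_time(c.size_bytes)
    return saved / c.size_bytes


def admit(candidates: Iterable[Candidate], write_budget_bytes: int) -> List[str]:
    """Admit candidates in descending score order until the flash write
    budget would be exceeded; the last admitted score acts as the threshold."""
    admitted: List[str] = []
    written = 0
    for c in sorted(candidates, key=admission_score, reverse=True):
        if written + c.size_bytes > write_budget_bytes:
            break  # everything scoring below this point is rejected
        admitted.append(c.key)
        written += c.size_bytes
    return admitted
```

Under these assumed parameters, a 4 KiB object predicted to be hit three times scores far higher than a 1 MiB object with the same prediction, because the fixed seek cost dominates small IOs. Under a byte-miss-rate objective the two would score equally per byte written, which is one way a byte-based metric can misjudge backend load.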
Kangaroo: Caching Billions of Tiny Objects on Flash

https://doi.org/10.1145/3477132.3483568

McAllister, Sara; Berg, Benjamin; Tutuncu-Macias, Julian; Yang, Juncheng; Gunasekar, Sathya; Lu, Jimmy; Berger, Daniel S.; Beckmann, Nathan; Ganger, Gregory R. (October 2021, Symposium on Operating Systems Principles)
The CacheLib Caching Engine: Design and Experiences at Scale

Berg, Ben; Berger, Daniel; McAllister, Sara; Grosof, Isaac; Gunasekar, Sathya; Lu, Jimmy; Uhlar, Michael; Carrig, Jim; Beckmann, Nathan; Harchol-Balter, Mor; et al. (November 2020, 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2020))
Web services rely on caching at nearly every layer of the system architecture. Commonly, each cache is implemented and maintained independently by a distinct team and is highly specialized to its function. For example, an application-data cache would be independent from a CDN cache. However, this approach ignores the difficult challenges that different caching systems have in common, greatly increasing the overall effort required to deploy, maintain, and scale each cache. This paper presents a different approach to cache development, successfully employed at Facebook, which extracts a core set of common requirements and functionality from otherwise disjoint caching systems. CacheLib is a general-purpose caching engine, designed based on experiences with a range of caching use cases at Facebook, that facilitates the easy development and maintenance of caches. CacheLib was first deployed at Facebook in 2017 and today powers over 70 services including CDN, storage, and application-data caches. This paper describes our experiences during the transition from independent, specialized caches to the widespread adoption of CacheLib. We explain how the characteristics of production workloads and use cases at Facebook drove important design decisions. We describe how caches at Facebook have evolved over time, including the significant benefits seen from deploying CacheLib. We also discuss the implications our experiences have for future caching design and research.
The CacheLib Caching Engine: Design and Experiences at Scale

Berg, Benjamin; Berger, Daniel; McAllister, Sara; Grosof, Isaac; Gunasekar, Sathya; Lu, Jimmy; Uhlar, Michael; Carrig, Jim; Beckmann, Nathan; Harchol-Balter, Mor; et al. (January 2020, 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2020))
